(54) Title: THE UTILISATION OF MULTI-LINGUAL NAMES ON THE INTERNET
(57) Abstract 

A method for providing for multi-lingual names for use on the Internet, related networks, and computers is disclosed, the method 
comprising the steps of: forming an initial multi-lingual name in a multi-lingual format; mapping the multi-lingual name to a corresponding 
coded name in a reversible manner, the coded name comprising a restricted subset of the ASCII character set; utilising the corresponding 
coded name on the Internet, related networks and computers in place of the multi-lingual name. Preferably the mapping step further 
comprises adding a predetermined pseudo-root name server to the corresponding coded name. The mapping can include converting the
multi-lingual name to a corresponding hexadecimal name and representing the hexadecimal name in an ASCII form. The corresponding 
coded name can be divided into a series of labels with each label having a predetermined portion comprising a control code for the label. 
The preferred embodiment is ideally utilised in existing or future Internet applications, utilities, resources or services. Existing uses include, 
but are not limited to: web browsers, editors, e-mail, news, telnet, ftp, gopher, WAIS, whois, nslookup, trace, ping, finger, rpc, cgi programs, 
usernames, and databases. When performing queries, the name server may respond with additional records for binary or sub-ASCII forms 
that match, or are variations of, the queried name: for example, where there are minor spelling errors, where the names differ only in case, or where their base 
equivalent characters are the same.
The Utilisation of Multi-Lingual Names on the Internet
Field of the Invention 

The present invention relates to the utilisation of 
multilingual names on the Internet, related networks and 
computer systems. Multilingual names include domain names, 
user names, file names, email addresses, newsgroups and 
Universal Resource Locators (URLs) . 
Background of the Invention 

In recent times, the internet has undergone
explosive growth in utilisation. The original formation of
the internet was based around the utilisation of English
language character formats, and as a result such formats
dominate domain name structures, URLs, etc. A large
proportion of the world's population does not utilise the
English language as its primary language of communication.
Hence, there is a general need for character-based formats
for other languages, for example Chinese and Arabic.
Unfortunately, due to backward compatibility problems,
these other language formats have received only restricted
utilisation on the Internet. It is desired to extend the
use of other languages to the fundamental components of the
internet, namely domain names, user names, file names, email
addresses, newsgroups and Universal Resource Locators
(URLs).

A glossary is provided, along with a brief
introduction to the Domain Name System (DNS) and
references to the most relevant Requests for Comments
(RFCs).

Summary of the Invention 

It is an object of the present invention to provide 
for an extended use of multilingual names on the internet, 
related networks and computer systems. 

In accordance with a first aspect of the present 
invention, there is provided a method for providing for 
multilingual names for utilisation on the Internet, the 
method comprising the steps of: forming an initial 
multilingual name in a multilingual format; mapping the
multilingual name to a corresponding coded name in a 
reversible manner, the coded name comprising a restricted 
subset of the ASCII character set; and utilising the 
corresponding coded name (on the Internet) in place of the
multilingual name. 

Preferably the mapping step further comprises adding a
predetermined pseudo-root name server to the corresponding
coded name, particularly when the name is a domain name or
email address. The mapping can include converting the
multilingual name to a corresponding Hexadecimal coded name
and representing the Hexadecimal coded name in an ASCII
form. The corresponding coded name can be divided into a
series of labels, with each label having a predetermined
portion comprising a control code for the label.

The preferred embodiment is ideally utilised in
existing or future internet applications, utilities,
resources or services. Existing applications include, but
are not limited to: web browsers, editors, email, news,
telnet, ftp, gopher, WAIS, whois, nslookup, trace, ping,
finger, rpc, cgi programs, file names, usernames, and
databases.

Brief Description of the Drawings 

Notwithstanding any other forms which may fall within the 
scope of the present invention, preferred forms of the invention
will now be described, by way of example only, with reference to 
the accompanying drawings in which: 

Fig. 1 illustrates the steps in the method of the preferred
embodiment.

Description of Preferred and Other Embodiments

The preferred embodiment discloses processes that 
allow: 

1. Multilingual names to be represented in limited
subsets of the ASCII character set;
2. Names which are compatible with existing software
applications and databases, thus requiring no change to
existing software; and
3. New software (and changes to existing software)
to be made that incorporate the processes described, which
may replace, or work with, existing software.
Using the processes described, multilingual domain
names can be utilised without changes to existing resolver
or name server software.

The preferred embodiment is fully backwards compatible
with existing systems and does not require any
changes to existing software used for processing domain
names, user names, file names, email addresses, newsgroups
and Universal Resource Locators (URLs).

Existing programs need not be changed; however, it
is expected that they will progressively be adapted to make it
easy for non-English alphabets to be read and typed in the
form of domain names, email addresses, etc.

The preferred embodiment allows multilingual names to
be written in many languages, or even a mix of languages, and then
converted to fit into a subset of ASCII characters. A
converting program is needed to perform the conversion and
to display multilingual names.

By way of definition, any program that converts between
representations of names (multilingual name -> coded name)
is called a converter; this may include resolvers, name
servers, web browsers, and any program that carries out the
converting process.

The preferred embodiment proposes, and addresses the
issues of, the following:

1. General Methods that allow a variety and mix of
representations of multilingual names;

2. Substitution of Characters for special words, or
base equivalent characters;

3. Control Codes that indicate the encoding used,
and splitting of names that are too long;

a. UCS-2 as Hex in ASCII, which is a particular
encoding and splitting method;

4. Pseudo-Root Names attached to hierarchical names,
to indicate an alternative hierarchy;

5. Application to Names of particular types:
strings, newsgroups, domain names, email addresses, and
URLs;

6. Forms of Implementation covering software and
interfaces.

Conventions 

The following conventions are used in the examples
below.

<> ASCII characters are in angle brackets eg. <Jason> 
[ ] UCS-2 characters are in square brackets eg. [Jason] 

Names with components or a hierarchy have usually been
written with separators between the components, such as the
at symbol '@', dot '.' or slash '/'.

eg. news: "comp.law.patents";

email: "Jason@OneAccount.net";
URL: "http://www.OneAccount.net/login.cgi".
Since this invention allows these symbols to be used
within components, these symbols act as separators only
outside of brackets,
eg. "<Jason>@<OneAccount>.<net>";

"<http:>//<www>.<OneAccount>.<net>/<login.cgi>".
General Methods 

A multilingual name may be a simple string, or may
comprise a number of components that require parsing and
interpretation as part of conversion to a coded name.
Components of names may be hierarchically organised from
left to right or right to left and may contain other
non-hierarchical components.

The choice between converting the entire string and
converting each component is left to implementors of
converters, since they are likely to be specialists in
their target language market.

Converting is at least the reversible transformation
of characters from a multilingual set to ASCII, and may
comprise parsing of components, substitution of characters,
encoding, splitting, control codes indicating the encoding 
or splitting, or attachment of pseudo-root names. 

Parsing of multilingual components involves
identification of separators. Each separator can now be
represented by several characters from several languages.
The user may even be given the option of choosing which
symbols to use as separator characters.

For example, instead of "@", it is possible to choose " at ", so
that a corresponding email address would be "Jason
at OneAccount.net".
Substitution of Characters 
Special Words 

Parts of a multilingual name may have a special meaning:
for instance, the file name extension or the protocol to use.

A Japanese language user may prefer to see and use the 
Japanese characters for ".exe" or "http:".

Converters may substitute ASCII characters in place of the 
synonymous multilingual characters. 
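The substitution of special words can be sketched in Python as a table lookup. This is a minimal illustration only: the tables `PREFIX_WORDS` and `SUFFIX_WORDS` and the function `substitute_special_words` are hypothetical, and, following the bracketed-English convention of this document, English words stand in for the characters of the user's own language.

```python
# Hypothetical special-word tables. Following this document's convention, the
# English words in the keys stand in for characters of the user's own language;
# a real converter would use tables chosen by a specialist in that language.
PREFIX_WORDS = {"Secure Web:": "https:", "Web:": "http:"}
SUFFIX_WORDS = {", program": ".exe"}


def substitute_special_words(name: str) -> str:
    """Replace a leading scheme word and a trailing file-type word with ASCII."""
    # Check longer words first, so that when one word is a prefix (or suffix)
    # of another, the more specific word is the one substituted.
    for word in sorted(PREFIX_WORDS, key=len, reverse=True):
        if name.startswith(word):
            name = PREFIX_WORDS[word] + name[len(word):]
            break
    for word in sorted(SUFFIX_WORDS, key=len, reverse=True):
        if name.endswith(word):
            name = name[: -len(word)] + SUFFIX_WORDS[word]
            break
    return name


print(substitute_special_words("Multilingual Test, program"))  # Multilingual Test.exe
print(substitute_special_words("Secure Web:"))                 # https:
```

Keeping the tables separate from the matching logic lets each language market supply its own words without changing the converter.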

Base Equivalent Characters

Sometimes it is desirable to ignore the case of
characters in English, such as when searching or matching
names. We call this being case insensitive. To make
comparisons, it is usual to force all the characters to
upper or lower case. Other languages' alphabets have
different rules. For instance, Greek has three forms of
Sigma, one of which is used only at the end of a word when
the word is lowercase.

Different kinds of comparisons may be done for each
alphabet. We therefore define sets of characters that are
equivalent to each other for the purposes of comparison. From
each set, one character is said to be the Base Equivalent
Character. When making that comparison, equivalent
characters are forced to the base equivalent character.

For Case Insensitive comparisons on UCS-2, it is
preferred that the base character be the earliest character
of each set in ISO 10646 order, from within the language.
This forces Latin, Greek and Cyrillic to uppercase and
Hiragana and Katakana to lowercase. So, for instance, Greek
lowercase alpha is substituted with Greek uppercase alpha,
but not with Latin "A", nor with Cyrillic "A".
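A minimal Python sketch of case-insensitive base equivalent substitution follows. It assumes that each character's equivalence set can be approximated by the character and its Unicode upper- and lower-case mappings (Python's `str.upper` and `str.lower`), and picks the member with the lowest code point, i.e. the earliest in ISO 10646 order. The function names are hypothetical. Unicode defines no case mappings between small and ordinary Hiragana or Katakana, so a converter handling Japanese would need an explicit table of equivalent kana in addition to this.

```python
def base_equivalent(ch: str) -> str:
    """Return the base equivalent of ch for a case-insensitive comparison."""
    # Assumption: the equivalence set is approximated by ch and its upper- and
    # lower-case forms. Mappings that expand to several characters (such as
    # German sharp s -> "SS") are ignored, so each character maps to one character.
    candidates = [c for c in {ch, ch.upper(), ch.lower()} if len(c) == 1]
    # The base equivalent is the earliest member in ISO 10646 (code point) order.
    return min(candidates, key=ord)


def to_base_equivalents(name: str) -> str:
    return "".join(base_equivalent(ch) for ch in name)


print(to_base_equivalents("Traveller's Rescue"))  # TRAVELLER'S RESCUE
print(to_base_equivalents("\u03b1\u03c3\u03c2"))   # alpha, sigma, final sigma -> ΑΣΣ
```

Because each script's upper- and lower-case letters are mapped only to each other, Greek alpha becomes Greek capital alpha (U+0391) and never Latin "A" (U+0041) or Cyrillic "A" (U+0410), as required above.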

Another type of comparison could be character shape. 
The letters "IBM" could be Latin, Greek or Cyrillic. A 
language insensitive search could force them all to Latin. 
Control Codes 

Control codes can be attached to a coded name, or to 
each component of a coded name, to indicate the type of 
encoding, and the split sequence. A particular example is 
UCS-2 as Hex in ASCII. 
Method of Encoding 
When a multilingual name is converted into a coded

name, control codes can be attached to the coded name to 
indicate the method of encoding. 
Split Sequence 

If a component of a multilingual name is too long when 
converted to fit into a single component of a coded name, 
it may be split across several components of a coded name. 
Control codes attached to each component of the coded name
can indicate which part of a multilingual component it
belongs to, i.e. its order in a split component.

This is particularly useful for hierarchical names 
with limits on the length of components, such as domain 
names. 

UCS-2 as Hex in ASCII 

UCS-2 as Hex in ASCII is an encoding of multilingual
names. Its 3-octet control code is <X-n>, where n is an
ASCII digit from <1> to <9> when the coded component is part of a
split component, and <0> when the component is not split. The
control code is prepended to the coded component.

Each UCS-2 character becomes four ASCII characters in
the ranges <0>-<9> and <A>-<F>, representing the 16-bit value of the
UCS-2 character in hexadecimal.

An example of UCS-2 to ASCII, not split:
[Jason] -> <X-0004A00610073006F006E> 
An example of split ASCII to UCS-2 
<X-30065><X-2006E><X-1004F> -> [One] 
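The encoding can be sketched in Python as follows, under the assumption that input characters are limited to UCS-2 (code points up to U+FFFF). The names `encode_unsplit` and `decode_parts` are hypothetical. Because the DNS compares names case-insensitively, the decoder also accepts lower-case hexadecimal digits and a lower-case <x->.

```python
import string


def encode_unsplit(text: str) -> str:
    """Encode a component as UCS-2 as Hex in ASCII with control code <X-0>."""
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside UCS-2 (beyond U+FFFF)")
    # Each UCS-2 character becomes four upper-case hexadecimal digits.
    return "X-0" + "".join(f"{ord(ch):04X}" for ch in text)


def decode_parts(parts: list[str]) -> str:
    """Decode coded parts <X-n>... back to text; the digit n gives their order."""
    def split_number(part: str) -> int:
        if len(part) < 3 or part[:2].upper() != "X-" or part[2] not in string.digits:
            raise ValueError(f"not a UCS-2 as Hex in ASCII component: {part!r}")
        return int(part[2])

    ordered = sorted(parts, key=split_number)
    hex_data = "".join(p[3:] for p in ordered)
    if len(hex_data) % 4 or any(c not in string.hexdigits for c in hex_data):
        raise ValueError("coded data is not a whole number of hex UCS-2 characters")
    return "".join(chr(int(hex_data[i:i + 4], 16)) for i in range(0, len(hex_data), 4))


print(encode_unsplit("Jason"))                          # X-0004A00610073006F006E
print(decode_parts(["X-30065", "X-2006E", "X-1004F"]))  # One
```

Sorting on the split digit means the parts may be supplied in domain-name order (highest first, as in the example) or in any other order.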
Pseudo-Root Names

A pseudo-root name is a predetermined name attached to 
coded hierarchical names, such as newsgroups and domain 
names, so that they become part of a predetermined 
hierarchy. By adding the pseudo-root name to all coded 
names, that branch of the hierarchy effectively becomes the
root of a pseudo-hierarchy. 

This has several useful properties:

1. Separation of Names

Coded names won't be mixed up with normal ASCII names, so
it is less confusing for users.

2. Separation of Risk

Technical, business or political changes to the
pseudo-root hierarchy names won't adversely affect the
real root or other branches.

3. Separation of Processing Load

In hierarchical distributed systems, such as DNS, the
processing load arising from multilingual names is
allocated to computers serving the pseudo-hierarchy.

4. Specialisation

Pseudo-root hierarchies can specialise in a particular
type of encoding or language. Different converters can
attach different pseudo-root names, meaning the converter
programs and hierarchies can specialise.

5. Politics

A pseudo-root can be made in a part of the hierarchy
in which control is exercised.

It is recommended that all coded domain names be
subdomains of "X-X.NET", and that coded newsgroups be created under
"alt.x-".

Application to Names
Many combinations of processes may be applied to

various kinds of names: 
Strings 

Simple multilingual strings, such as user names, might
merely be converted to a coded form with a control code
attached indicating the encoding method, such as X-0.

Strings with components, such as file names, might
also have special words substituted with synonymous
characters. For instance, if a Japanese file name is suffixed
by Japanese characters indicating that it is an executable
program, these characters may be replaced by the file name
extension ".exe".
Newsgroups 

Newsgroups are also known as Internet News, and 
Usenet.

Coded names can be used as the names of newsgroups,
and displayed to users as multilingual newsgroup names.

A newsgroup can be named in multilingual characters as
follows, using the example of a newsgroup about patent law in English:
[Law, Patent] (English language)

1. Substitute with base equivalent characters.
Substitute the ISO language code for the language.
<EN>.[LAW].[PATENT]

2. Convert UCS-2 to ASCII and add control codes.
<EN>.<X-0004C00410057>.<X-00050004100540045004E0054>

3. Add pseudo-root for multilingual news hierarchy.
<ALT>.<X->.<EN>.<X-0004C00410057>.<X-00050004100540045004E0054>

4. Present the normal ASCII name of the newsgroup.
"ALT.X-.EN.X-0004C00410057.X-00050004100540045004E0054"

Since some alphabets are shared by many languages, it is
recommended that the top-level newsgroup names within the
pseudo-hierarchy be the two-letter ISO language codes.
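The four steps above can be sketched in Python. This is an illustration only: `coded_newsgroup` and `ucs2_hex` are hypothetical names, base equivalent substitution is approximated by Python's `str.upper`, and components are not split, since no length limit for newsgroup components is stated here.

```python
def ucs2_hex(text: str) -> str:
    # UCS-2 as Hex in ASCII, unsplit: control code X-0, then 4 hex digits per character.
    return "X-0" + "".join(f"{ord(ch):04X}" for ch in text)


def coded_newsgroup(language: str, components: list[str]) -> str:
    """Build a coded newsgroup name under the recommended pseudo-root alt.x-."""
    # Step 1: base equivalent characters (approximated here by upper case);
    # the ISO language code stands for the language.
    base = [c.upper() for c in components]
    # Step 2: convert each component to UCS-2 as Hex in ASCII with control code X-0.
    coded = [ucs2_hex(c) for c in base]
    # Steps 3 and 4: prepend the pseudo-root and language code, join with dots.
    return ".".join(["ALT", "X-", language.upper()] + coded)


print(coded_newsgroup("en", ["Law", "Patent"]))
# ALT.X-.EN.X-0004C00410057.X-00050004100540045004E0054
```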
Domain Names 

A brief introduction to the domain name system is
supplied later. For details, see the referenced RFCs.
Domain Names are hierarchical names commonly used to
identify organisations on the internet. RFC1035 specifies
the presentation of domain names as domain labels separated
by dots, with the highest level domain label on the
right, and subdomains proceeding to the left. For example,
in "www.example.com.au.", "au" is the top level label for
Australia, "com" is the second level label for commercial
enterprises, "example" is the third level label (the name
of the enterprise), and "www" is the fourth level label,
identifying a computer in the enterprise. This is the
traditional way of writing domain names.

In the preferred embodiment, by contrast, the presentation of
domain names is left to implementors of converters. The implementors, or even the
users, may select appropriate separator, quote, and escape
symbols, along with special words, and the direction of the
hierarchy (left to right, right to left, etc.). Each
domain label could even be entered in a separate text field,
eliminating the need for separator characters. However, it
is often easier to write and type a domain name with
separating characters.

The domain name system is concerned with the format
of binary data between resolvers and name servers. Due to
compatibility issues, only a limited subset of ASCII is
used in labels: the characters 'A'-'Z', 'a'-'z', '0'-'9'
and '-'. It is an object of the preferred embodiment to
allow multilingual domain names to be represented in this
subset of ASCII.

A process for representing multilingual domain names
is shown in Fig. 1:
1. Parsing, and Substitution of Special Words 1;

2. Substitution of Base Equivalent Characters 2;

3. Encoding, Splitting and Control codes 3;

4. Adding pseudo-root domain name 4;

5. Presenting coded form of names 5.
1. Parsing, and Substitution of Special Words
Converters may accept domain name labels in a variety
of ways, such as selection from a list of countries, or
typing a partial domain name into a text field. Converters
which allow labels to be typed together into one field need
to parse the parts of the domain name into labels.
Separator, quote, and escape characters may be defined by
implementors of the converter, or be left to the user's
choice.

Special words may be substituted for selected or typed
labels. For instance, replacing the Arabic label for
Australia with "au", or the Thai label for business
with "com".

2. Substitution of Base Equivalent Characters

English domain names are case insensitive, so 
lowercase Latin should be replaced with uppercase. Other
languages may have different preferences. Defining the sets 
of equivalent characters can be left to implementors, and 
specialists in that language. 

3. Encoding, Splitting and Control codes
The Internet standard RFC1035 specifies that domain
names have an overall limit of 255 octets, and that each
label has a limit of 63 octets. Currently, labels only
contain the ASCII characters 'A'-'Z', 'a'-'z', '0'-'9' and '-'.
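A converter might check each coded domain name against these limits before use. The following Python sketch is hypothetical (`is_valid_domain_name` is not part of any library). The rule that a label may not begin or end with a hyphen comes from the preferred name syntax of RFC 1035; that syntax's further rule that a label begin with a letter, later relaxed by RFC 1123, is not enforced here.

```python
import string

# Characters allowed in a label: letters, digits and hyphen.
LDH = set(string.ascii_letters + string.digits + "-")


def is_valid_domain_name(labels: list[str]) -> bool:
    """Check labels against the RFC 1035 limits described above."""
    for label in labels:
        if not 1 <= len(label) <= 63:            # each label: 1 to 63 octets
            return False
        if not set(label) <= LDH:                # only A-Z, a-z, 0-9 and '-'
            return False
        if label[0] == "-" or label[-1] == "-":  # no leading or trailing hyphen
            return False
    # On the wire each label carries a one-octet length prefix and the name ends
    # with a zero-length root label; the total may not exceed 255 octets.
    return sum(len(label) + 1 for label in labels) + 1 <= 255


print(is_valid_domain_name(["X-0" + "0041" * 15, "AU", "X-X", "NET"]))  # True
print(is_valid_domain_name(["X-0" + "0041" * 16, "AU", "X-X", "NET"]))  # False
```

The two calls show the boundary used below: a 3-character control code plus 15 encoded characters fills exactly 63 octets, while a 16th character overflows the label.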

It is possible that in future labels could be made of
8-bit (ASCII, ISO 8859), 16-bit (UCS-2), 32-bit (UCS-4), or
variable length characters (UTF-8, UTF-7). Labels could
even be made of other data, such as bitmaps (pictures) or
sound data.

For the representation of multilingual domain names, 
the preferred method of encoding is UCS-2 to Hex in ASCII, 
as it is fully compatible with existing DNS tools. 

Since each UCS-2 character maps to 4 ASCII characters
and the control code takes 3 octets, a label can carry at
most 15 UCS-2 characters (3 + 15 x 4 = 63 octets). Any
label that is longer than 15 UCS-2 characters must therefore be
split, so that each part fits into the maximum label length of
63 octets. It is further recommended that labels which are
exactly 15 UCS-2 characters long be split as well, with an empty
second coded part. This allows for separation of control of
the common part of a shared domain label, as will be
further explained below.

There may be several businesses that share the first 
part of their name. Rather than giving control of the
common part to one of these businesses, it is possible to 
give control to a neutral third party, such as the 
superdomain. 
For example:
[Traveller's Rescue].<AU>,
[Traveller's Rest].<AU>,
and
[Traveller's Res].<AU>
when split and prefixed would become
<X-2>[cue].<X-1>[Traveller's Res].<AU>,
<X-2>[t].<X-1>[Traveller's Res].<AU>,
and
<X-2>[].<X-1>[Traveller's Res].<AU>
Control of the common domain <X-1>[Traveller's Res].<AU>
could be given to <AU>, or shared by the organisations.
Each organisation can have control over its <X-2>
subdomain.
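The splitting described above can be sketched in Python. It assumes a maximum of 15 UCS-2 characters per part and, as a generalisation of the <X-2>[] example, that a label whose length is an exact multiple of 15 receives an empty final part; `split_encode` is a hypothetical name. The three names from the example produce the same <X-1> label, which is what allows the common domain to be controlled separately.

```python
def split_encode(text: str, chunk: int = 15) -> list[str]:
    """Encode one domain label as UCS-2 as Hex in ASCII, splitting if needed.

    The coded labels are returned highest split number first, i.e. in the
    left-to-right order in which they appear in a domain name.
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside UCS-2")
    hex_chars = [f"{ord(ch):04X}" for ch in text]
    if len(hex_chars) < chunk:
        return ["X-0" + "".join(hex_chars)]  # short label: not split
    parts = [hex_chars[i:i + chunk] for i in range(0, len(hex_chars), chunk)]
    if len(hex_chars) % chunk == 0:
        # Assumption: an exact multiple of 15 gets an empty final part,
        # generalising the <X-2>[] example above.
        parts.append([])
    if len(parts) > 9:
        raise ValueError("a label can be split into at most 9 parts")
    coded = [f"X-{n}" + "".join(p) for n, p in enumerate(parts, start=1)]
    return coded[::-1]


for name in ["Traveller's Rescue", "Traveller's Rest", "Traveller's Res"]:
    print(".".join(split_encode(name) + ["AU"]))
# All three printed names end in the same common domain: the shared X-1 label, then AU.
```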
4. Adding pseudo-root domain name

A pseudo-root domain name is added to the coded domain
name, for the reasons mentioned in "Pseudo-Root Names".
Name servers for the pseudo-root may be specialised for the
processing of names in a particular encoding, or language.

The recommended pseudo-root domain name to add is
<X-X>.<NET>. That is, "X-X.NET.".
5. Presenting coded form of name

A converter may have to present the coded form in a
way which is usable by applications. The traditional way
is specified in RFC1035: labels separated by dots, with
the highest level label to the right.

Converters that query the DNS themselves may not need
to concatenate the labels into a contiguous string.
Example of Converting a Multilingual Domain Name

The following provides an example of the domain name
conversion process of the preferred embodiment.
"Glebe, Traveller's Rescue, Australia"
1. Parsing, and Substitution of Special Words

-> [Glebe].[Traveller's Rescue].<AU>
2. Substitution of Base Equivalent Characters
-> [GLEBE].[TRAVELLER'S RESCUE].<AU>

3. Encoding, Splitting and Control codes
Encoding UCS-2 characters as Hex in ASCII

-> <0047004C004500420045>.<00540052004100560045004C004C00450052002700530020005200450053004300550045>.<AU>
Splitting and Prefixing with Control codes

-> <X-00047004C004500420045>.<X-2004300550045>.<X-100540052004100560045004C004C00450052002700530020005200450053>.<AU>

4. Adding pseudo-root domain name

-> <X-00047004C004500420045>.<X-2004300550045>.<X-100540052004100560045004C004C00450052002700530020005200450053>.<AU>.<X-X>.<NET>.

5. Presenting coded form of name

-> X-00047004C004500420045.X-2004300550045.X-100540052004100560045004C004C00450052002700530020005200450053.AU.X-X.NET.
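The worked example can be reproduced with a short Python sketch. The function names are hypothetical; step 1 parsing is assumed to have been done already (labels are passed as (text, is_multilingual) pairs), base equivalent substitution is approximated by `str.upper`, and the checks for characters outside UCS-2 or for more than 9 split parts are omitted for brevity.

```python
PSEUDO_ROOT = ["X-X", "NET"]  # recommended pseudo-root domain name "X-X.NET."


def split_encode(text: str, chunk: int = 15) -> list[str]:
    """UCS-2 as Hex in ASCII with split control codes, highest part first."""
    hex_chars = [f"{ord(ch):04X}" for ch in text]
    if len(hex_chars) < chunk:
        return ["X-0" + "".join(hex_chars)]
    parts = [hex_chars[i:i + chunk] for i in range(0, len(hex_chars), chunk)]
    if len(hex_chars) % chunk == 0:
        parts.append([])  # assumption: exact multiples get an empty final part
    coded = [f"X-{n}" + "".join(p) for n, p in enumerate(parts, start=1)]
    return coded[::-1]


def code_domain_name(labels: list[tuple[str, bool]]) -> str:
    """Steps 2 to 5 for labels already parsed and substituted in step 1.

    Each label is (text, is_multilingual); ASCII labels such as "AU" pass through.
    """
    coded: list[str] = []
    for text, is_multilingual in labels:
        if is_multilingual:
            # Step 2 (approximated by upper case), then step 3.
            coded.extend(split_encode(text.upper()))
        else:
            coded.append(text)
    coded.extend(PSEUDO_ROOT)        # step 4: add the pseudo-root
    return ".".join(coded) + "."     # step 5: dotted form with trailing root dot


print(code_domain_name([("Glebe", True), ("Traveller's Rescue", True), ("AU", False)]))
```

The printed name matches the result of step 5 above.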

Email

Email mailboxes and addresses can use a larger part of
the ASCII character set than DNS. Normally, an email
address comprises a mailbox name (local part) at a domain
name.

A multilingual email address could instead be formed
using the language's own symbols for addressing.
For instance, [Jason at Home, Australia] instead of
Jason@HOME.AU. Converters or mail programs are responsible
for processing the email addresses correctly. Multilingual
addresses could be processed in a number of ways:
1. Parsed, coded, and sent to a mailbox at a domain
Parsed
-> [Jason]@[Home].<AU>
Coded
-> <X-0>[Jason]@<X-0>[HOME].<AU>.<X-X>.<NET>.
2. Coded, and sent to a converting mail exchanger
-> <X-0>[Jason at Home, Australia]@<MAIL>.<X-X>.<NET>.
3. Coded, and resolved by DNS
-> <X-0>[Jason at Home, Australia].<MAIL>.<X-X>.<NET>.
4. Parsed, coded, and resolved by DNS
-> <X-0>[Jason at Home].<AU>.<MAIL>.<X-X>.<NET>.
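Option 1 can be sketched in Python as follows. The function names are hypothetical; the local part is coded without base equivalent substitution (as in the example, where [Jason] keeps its case) while the domain labels are upper-cased; no splitting is done; and the trailing root dot is omitted, since the address syntax of RFC 5321 does not provide for it. RFC 5321 also limits a local part to 64 octets, so an unsplit <X-0> local part can carry at most 15 UCS-2 characters.

```python
def ucs2_hex(text: str) -> str:
    # UCS-2 as Hex in ASCII, unsplit: control code X-0, then 4 hex digits per character.
    return "X-0" + "".join(f"{ord(ch):04X}" for ch in text)


def code_email(local: str, domain_labels: list[str], ascii_tail: list[str]) -> str:
    """Option 1: code a parsed local part and domain, under the pseudo-root."""
    coded_local = ucs2_hex(local)  # the local part keeps its case
    # Domain labels are reduced to base equivalents (approximated by upper case).
    coded_domain = [ucs2_hex(label.upper()) for label in domain_labels]
    return coded_local + "@" + ".".join(coded_domain + ascii_tail + ["X-X", "NET"])


print(code_email("Jason", ["Home"], ["AU"]))
# X-0004A00610073006F006E@X-00048004F004D0045.AU.X-X.NET
```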
Universal Resource Locators (URLs) 

URLs encompass file names, newsgroups, domain names, 
email, and many other names. A larger part of the ASCII 

character set is available for names, and encoding of
octets is provided for. However, the schemes that URLs 
encompass remain restricted in the characters they can use, 
so there is a need for coded multilingual URLs. 
Substitution of special words and symbols 

URLs are currently defined for the US-ASCII character
set. Multilingual users may prefer to use symbols from
their own language in place of the specific scheme names,
reserved and special characters. Converters would then
parse these symbols and replace them with the US-ASCII
symbols.

For instance [Secure Web] -> <https:> or [web] -> <http:>. 

Schemes that use Internet protocols are formatted as:
"<scheme>://<user>:<password>@<host>:<port>/<url-path>".
Multilingual URLs for such schemes should be parsed into a coded form like
this. Conversion of components using UCS-2 as Hex in
ASCII can be applied to the user name, password, and host
name (which is usually a domain name), and to components of
the url-path.

Multilingual port numbers should be converted into the
synonymous ASCII number if written as a non-ASCII number,
such as in Chinese or Sanskrit numerals.
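A sketch of assembling a coded URL in Python, using the standard library's `urllib.parse.urlunsplit`, follows. `code_url`, `ascii_port` and `ucs2_hex` are hypothetical names; the user and password components are omitted, the file name extension is passed as an ASCII special word, and port conversion handles only decimal digits (such as the Devanagari digits commonly used for Sanskrit) through `unicodedata.decimal`. Traditional Chinese numerals such as those for "eighty" are not decimal digits in Unicode, so `unicodedata.decimal` rejects them and a converter would need its own parser.

```python
import unicodedata
from urllib.parse import urlunsplit


def ucs2_hex(text: str) -> str:
    # UCS-2 as Hex in ASCII, unsplit: control code X-0, then 4 hex digits per character.
    return "X-0" + "".join(f"{ord(ch):04X}" for ch in text)


def ascii_port(port: str) -> str:
    # unicodedata.decimal maps a decimal digit of any script to its value; it
    # raises ValueError for non-decimal numerals such as Chinese characters.
    return "".join(str(unicodedata.decimal(ch)) for ch in port)


def code_url(scheme: str, host_labels: list[str], host_tail: list[str],
             port: str, path_segments: list[str], extension: str = "") -> str:
    """Assemble <scheme>://<host>:<port>/<url-path> from already-parsed parts.

    host_labels are multilingual domain labels; host_tail holds ASCII labels
    such as the pseudo-root; extension is an ASCII special word for the last segment.
    """
    host = ".".join([ucs2_hex(label.upper()) for label in host_labels] + host_tail)
    netloc = f"{host}:{ascii_port(port)}" if port else host
    path = "/" + "/".join(ucs2_hex(segment) for segment in path_segments) + extension
    return urlunsplit((scheme, netloc, path, "", ""))


# Port written with Devanagari digits eight and zero (U+096E, U+0966).
print(code_url("http", ["OneAccount"], ["X-X", "NET"], "\u096e\u0966", ["login"], ".cgi"))
```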
The url-path may be further parsed, and broken down
into special and reserved characters, path names, file
names, search, argument names, and argument values.

It is left to implementors of converters to select the
characters and symbols in their language that will
substitute for scheme names, special and reserved
characters.

Some examples, parsed and substituted, but not coded:
[Mail: Jason at Home, Australia]

-> <mailto:[Jason]@[Home].AU.X-X.NET>

[News: English, Patent Law]

-> <news:alt.x-.en.[law].[patent]>

[Secure Web: OneAccount - login (Jason)]

-> <https://[OneAccount].X-X.NET./[login].cgi?[login]=[Jason]>

[Local File: Patents - Multilingual Test, program]
-> <file://localhost/[Patents]/[Multilingual Test].exe>
Forms of Implementation 

The method of the preferred embodiment can take many
different forms of implementation, for example, as follows:
Stand Alone Converter

This form takes in a multilingual name, and outputs a
coded name as an ASCII string, or some other
representation. The converter may be created to work for
particular kinds of names, such as URLs or email addresses,
and/or to work with particular applications, such as web
browsers.

Converters may send the ASCII string to relevant
applications, either automatically or through user controls.
They may also allow a user to copy and paste to and from their
applications.

Incorporated into applications 

Alternatively, the conversion function may be
incorporated into applications such as browsers,
editors, email, telnet, ftp, and news.
Plug-in or add-on to application

The converter may be a program or library that plugs
in or adds onto the existing applications, providing the
application with the added multilingual name functionality.
Application loadable control 

The converter may take the form of a control that the
application can use. Examples are Web pages that include
JavaScript, Java controls, or ActiveX controls.

Such controls and plug-ins may replace, or overlay, a
browser's current URL entry field with a multilingual name
field. This field both displays the multilingual name and
allows entry of multilingual URLs. Coded names are passed
back and forth between converter and browser.
Web Page interfaces to converter 
A converter may run on a web server, with access to
the converter being provided through multilingual web
pages. Users access a multilingual URL/domain name
service such as "http://X-X.NET/". If their browser
requests a particular language, a web page in that language
is provided (if available); otherwise a multilingual page
is provided.

The web page can typically provide a form, so that 
the user may type in a multilingual URL. Users may select 
common parts from lists such as the encoding scheme, 
organisation type, and country. These lists may have
defaults on a per user, or per language basis. 

When the multilingual URL form is submitted, the 
converter server has several options: 

1. returning the coded URL as an ASCII string, which 
the user may link to, or use as they please.

2. providing a redirection to the coded URL. 

3. presenting a frame view, where one frame contains 
the requested coded URL, and another contains a 
multilingual URL form, for typing other URLs. 

Multilingual Registries may also provide a web
interface for registration of multilingual
names, such as domain names and email addresses.
Converter packaged with other facilities

Converters may be packaged with other facilities. For 
instance, a program may parse a multilingual name in 
several ways, and perform several searches such as DNS 
lookup, whois search, and web page search. It might present 
information to a user, or return specific information to a 
client application. 
Resolvers 

The resolver accepts the multilingual name directly from
applications, but then converts it before querying name
servers. Resolvers may query name servers for both the
binary and sub-ASCII representations of the multilingual
domain name. The resolver may also try variations on the
name.

Name Servers 

When performing recursive queries, the name server
accepts sub-ASCII or binary multilingual domain names, and
queries other name servers with sub-ASCII or binary
multilingual domain names.

The name server may convert from binary name to 
another format before querying its database and may return 
answers for either form. 

In responses, the name server may include
additional records for binary or sub-ASCII forms (including
CNAME and A records) that match, or are variations of, the
queried name: for example, where there are minor spelling
errors, where the names differ only in case, or where their base
equivalent characters are the same.
Databases 

Databases may keep records in binary or sub-ASCII 
form. Conversion between them, and conversion for client or 
server programs may be required. 
Other areas of application

The principle of having the first 3 characters in a
field represent the encoding scheme can be applied
generally: for example, to directory services such
as Whois and LDAP, to search engines, and to databases.

It can therefore be seen that the preferred
embodiment provides for the representation of multilingual
characters in more limited character sets. In particular,
the process includes converting UCS-2 to Hex in ASCII,
applied to internet names used in the Domain Name System
(DNS), email, news and Uniform Resource Locators. For DNS,
a multilingual domain label is represented in one or more
sub-ASCII labels. The first 3 characters identify the
label's encoding scheme, leaving a maximum of 60 sub-ASCII
characters for encoded data in each domain name label.

In UCS-2 to Hex in ASCII encoding, the first and second
characters are the name of the scheme, 'X-', and the third
character identifies the part of the split multilingual
label. The name of the pseudo root server, "X-X.NET", is
attached to the sub-ASCII representation of the
multilingual domain name. The pseudo root server is
visible in the current domain name space. For email, the
first three characters of the local-part identify the
local-part's encoding scheme. The domain name follows the
rules for DNS.

Alternatively, the entire email address is encoded,
and sent to the relevant mail server, exchanger or gateway
for processing or forwarding. For URLs, the first three
characters of each component (name, label, argument) in the
URL identify the encoding scheme.

The encoding and representation can be implemented in
the form of various software devices, such as upgrades or
add-ons to existing software, incorporation in new
software, stand-alone applications, databases, servers,
clients, resolvers, and name servers.

The first three characters identify the encoding scheme to a converter, so that it may display the name in the right character set. These characters mean nothing to existing DNS, email and web systems, which simply treat the coded string as the name of a domain, mailbox, file or other data. Hence variations using different encoding identifiers can also easily be used.
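On the receiving side, a converter could recognise the prefix and restore the original name, roughly as sketched here. The pattern ("X-", one digit, then groups of four hex digits), the case-insensitive removal of a trailing "X-X.NET", and the rule that part "0" starts a new multilingual label are all assumptions, not details given in the text; the function name is hypothetical. One consequence of identifiers that mean nothing to existing systems is that an ordinary label which happens to fit the pattern would be misread.

```python
import re

# Hypothetical coded-label pattern: 'X-', a one-digit part id, UCS-2 hex.
CODED = re.compile(r"[Xx]-([0-9])((?:[0-9A-Fa-f]{4})+)")
PSEUDO_ROOT = ["x-x", "net"]


def decode_domain(coded: str) -> str:
    """Convert a coded domain name back to its multilingual form."""
    labels = coded.rstrip(".").split(".")
    # DNS names are case-insensitive, so compare the pseudo root in lower case.
    if [label.lower() for label in labels[-2:]] == PSEUDO_ROOT:
        labels = labels[:-2]
    out, pending = [], []

    def flush():
        # Join the hex data of all parts collected so far and decode it.
        if pending:
            out.append(bytes.fromhex("".join(pending)).decode("utf-16-be"))
            pending.clear()

    for label in labels:
        match = CODED.fullmatch(label)
        if match is None:        # ordinary sub-ASCII label: keep as is
            flush()
            out.append(label)
            continue
        part, data = match.groups()
        if part == "0":          # first part of a new multilingual label
            flush()
        pending.append(data)
    flush()
    return ".".join(out)


print(decode_domain("www.X-04E2D6587.com.X-X.NET"))  # www.中文.com
```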
This scheme can be designed for temporary use, until applications and databases (including name servers and resolvers) become compliant with a multilingual character set such as ISO10646 or Unicode.

It is further possible under this scheme to have several pseudo roots. This allows multiple registries to run, each specialising in particular languages. However, it is recommended that one pseudo root be selected, with registries sharing the pseudo root's database.

It would be further appreciated by a person skilled in 
the art that numerous variations and/or modifications may 
be made to the present invention as shown in the specific 
embodiments without departing from the spirit or scope of 
the invention as broadly described. The described present 
embodiments are, therefore, to be considered in all 
respects to be illustrative and not restrictive. 
Glossary 

The following terms are hereinafter defined for ease 
of understanding: 

Multilingual Name - made of non-ASCII characters; may be a string of characters, or several labels or fields. This specifically includes, but is not limited to, domain names, user names, file names, email addresses, newsgroups and Uniform Resource Locators (URLs).

Coded Name - a string, or fields, of ASCII characters 
that represent a Multilingual name in some encoding. 

Converter - any program that converts from one representation of names to another, especially from UCS-2 to Hex in ASCII and back. Converters may incorporate resolvers, and other functions such as substitution for equivalent characters.
ASCII - A character set that contains the English alphabet, Arabic numerals, punctuation marks and some computer control codes. There are several varieties of ASCII.
Sub-ASCII - The limited subset of the ASCII character set that has been used in domain names: 'A'-'Z', 'a'-'z', '0'-'9', and '-' (dash).

UCS - Universal multi-byte Character Set encodings of ISO10646 and Unicode, which cover most living languages. UCS-2 uses 2 bytes (16 bits) per character; UCS-4 uses 4 bytes.
Equivalent Characters - characters that are mapped to the same base character by a program. In English, 'A' and 'a' differ only in case; to case-insensitive programs, such as DNS, they are equivalent. In other languages, equivalent characters may differ in other ways. For example, Greek has two lowercase sigmas, one of which is used at the end of a word. Developers of programs for different language markets are specialists in these areas; they decide which characters are equivalent.

Domain Name - a name of up to 255 octets made of several labels, one for each level in the hierarchy. "www.x-x.net." is a domain in the "x-x.net." domain, which is in turn in the "net." domain. The DNS stores information related to domain names.
Label - part of a domain name, up to 63 octets.
DNS - The Domain Name System. A distributed database that is accessed by resolvers asking name servers. The DNS stores computers' names, IP addresses, and more. See RFC 1034, 1035 and others.

IP address - A 4-byte Internet network address.
Resolver - a program that applications use to query the DNS. A resolver in turn asks Name Servers for information.

Name Server - a name server has information about its domain that it gives to resolvers and other name servers. If it does not know the answer, it may query other name servers.

Root Name Servers - the name servers at the top of all hierarchies.






SUBSTITUTE SHEET (Rule 26) (RO/AU) 



WO 99/19814 PCT/AU98/00849




Pseudo-Root Name Servers - some applications may add a predetermined name to all of their domain name queries, making it seem as if that name server is at the top of all hierarchies.

RFC - Request for Comments documents describe how the Internet works. The Internet Engineering Task Force draws Internet standards from the list of RFCs.
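The "Equivalent Characters" entry leaves the choice of equivalences to each program. As one possible choice, Python's str.casefold() and the unicodedata module can build a comparison key under which equivalent names compare equal. The optional accent stripping (NFD decomposition with combining marks removed) is an assumption about what a particular language market might treat as equivalent, and the function name is hypothetical.

```python
import unicodedata


def fold_equivalent(name: str, strip_marks: bool = False) -> str:
    """Map a name to a key on which 'equivalent' characters compare equal.

    casefold() handles case ('A' -> 'a') and the Greek final sigma ('ς' -> 'σ').
    strip_marks additionally drops combining accents ('ü' -> 'u'); whether that
    counts as equivalence is a per-language decision, so it is optional here.
    """
    key = name.casefold()
    if strip_marks:
        decomposed = unicodedata.normalize("NFD", key)
        key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Normalise to composed form so equal-looking keys are equal strings.
    return unicodedata.normalize("NFC", key)


print(fold_equivalent("ΛΌΓΟΣ") == fold_equivalent("λόγος"))  # True
```

A name server or converter could store and look up names under such a key, which is one way to answer queries that differ from a registered name only in case or in equivalent characters.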
Introduction to the Domain Name System
By way of introduction to the internet's Domain Name 
System, we illustrate with an example. 

When a user wants to view a web page, they may type in or select its URL. For example, a superannuation web page URL is "http://www.superannuation.net/index.htm". "www.superannuation.net" is a domain name, that is, the name of the computer on which the page is kept. That computer's IP address (Internet number) must be found to get the page. This is done by asking the DNS.

The web browser asks a DNS resolver to find the IP address of the domain name. The resolver asks the local name server for the address. If the local name server does not know, it tracks down the address by asking other name servers. The local name server asks the "net." domain name server where the "superannuation.net." domain name server is. Then it asks this subdomain name server for the IP address of the domain name www.superannuation.net, which is 105.42.3.5 (just an example address).

The local name server then tells the resolver the IP address, which in turn informs the web browser. The web browser now asks the computer at that IP address for the web page via "http://www.superannuation.net/index.htm".
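The lookup walk-through above can be modelled with a toy, in-memory set of name servers. Everything here is hypothetical: the server names and the table are invented for illustration, and only the address 105.42.3.5 comes from the example. The text starts at the "net." server; the sketch starts one step earlier, at a root server, as a name server without cached referrals would, and passing start="net-server" reproduces the text's sequence.

```python
# Hypothetical delegation data modelled on the example in the text. Each name
# server maps a domain either to a referral ("NS", next server) or to an
# answer ("A", address).
SERVERS = {
    "root": {"net.": ("NS", "net-server")},
    "net-server": {"superannuation.net.": ("NS", "superannuation-server")},
    "superannuation-server": {"www.superannuation.net.": ("A", "105.42.3.5")},
}


def resolve(name: str, start: str = "root"):
    """Follow referrals from `start` until an address record is found.

    Returns the address and the list of servers asked, mimicking the work a
    local name server does on behalf of a resolver.
    """
    fqdn = name.rstrip(".").lower() + "."
    server, asked = start, []
    for _ in range(16):  # guard against referral loops
        asked.append(server)
        zones = SERVERS[server]
        matches = [z for z in zones if fqdn == z or fqdn.endswith("." + z)]
        if not matches:
            raise LookupError(f"{server} knows nothing about {fqdn}")
        zone = max(matches, key=len)  # most specific entry wins
        kind, value = zones[zone]
        if kind == "A":
            if zone != fqdn:
                raise LookupError(f"no address for {fqdn}")
            return value, asked
        server = value  # referral: ask the next name server
    raise LookupError("too many referrals")


print(resolve("www.superannuation.net"))
# ('105.42.3.5', ['root', 'net-server', 'superannuation-server'])
```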

Internet applications such as web browsers, ftp, telnet and email programs all use resolvers to ask the DNS for the addresses of domain names. Sensible domain names are easier for people to remember than IP addresses, especially when they are in their own language. To date, DNS implementations have required names to be in a small subset of ASCII: the letters A-Z, digits 0-9, and the dash (-).
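A test for this subset, together with the 63-octet label and 255-octet name limits given in the glossary, might look like the sketch below. It follows the RFC1035 preferred syntax (a label starts with a letter, continues with letters, digits or dashes, and does not end with a dash) and offers, as an option, the RFC1123 relaxation that allows a leading digit. The function name is hypothetical.

```python
import re

# RFC 1035 preferred syntax: letter first, then letters/digits/dashes,
# not ending in a dash. RFC 1123 additionally allows a leading digit.
LABEL_1035 = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?")
LABEL_1123 = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def is_sub_ascii_name(name: str, allow_leading_digit: bool = False) -> bool:
    """Check a dotted domain name against the sub-ASCII rules and size limits."""
    label_re = LABEL_1123 if allow_leading_digit else LABEL_1035
    labels = name.rstrip(".").split(".")
    if any(len(label) > 63 or not label_re.fullmatch(label) for label in labels):
        return False
    # Wire format: one length octet per label plus the terminating root octet.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    return wire_length <= 255


print(is_sub_ascii_name("www.superannuation.net"))  # True
print(is_sub_ascii_name("www.中文.com"))             # False
```

A multilingual name fails this check, whereas its coded form (for example "www.X-04E2D6587.com.X-X.NET") passes, which is the property the encoding scheme relies on.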

Internet standard documents are readily available on the Internet. The most pertinent to this patent application is RFC1035: Domain Names - Implementation and Specification, which describes how DNS works and the format of names in detail.

The DNS specification RFC1035, with further updates and clarifications, states that domain name labels may contain up to 63 octets of binary data. It is suggested that names be made from the characters A-Z, 0-9 and - (dash), a restricted subset of US-ASCII, so that legacy applications keep working.

Until all Internet applications and protocols (including resolvers, name servers and databases) are able to handle binary labels, it is desirable to represent binary labels, especially multilingual domain names, in this subset of ASCII.

Existing RFCs and Drafts

By way of background, a number of RFC documents and internet-drafts are available from the Internet Engineering Task Force at http://ietf.org/

http://dxcoms.cern.ch/wwwcs/public/ip/draftslist.html

Although these documents frame the way in which the Internet should work, a number of their recommendations have not been adopted or implemented.

RFC822 Standard for the Format of ARPA Internet Text Messages defines Internet mail, and specifies the format of email addresses.

RFC1035 Domain Names - Implementation and Specification defines the DNS protocol, and specifies a format for domain names as a sequence of labels separated by dots. Labels begin with a letter, and may contain characters from 'A'-'Z', 'a'-'z', '0'-'9' and dash.

RFC1123 Requirements for Internet Hosts allows domain labels to begin with letters or numbers.

RFC1738 Uniform Resource Locators (URL) specifies the format of URLs in a subset of US-ASCII that permits binary data as octets represented by %HH, where H is 0-9 or A-F, more commonly known as 'Hex in ASCII'.

RFC2130 Character Set Workshop Report recommends ISO10646 as the base character set for the Internet, and also says that DNS should stay in a limited ASCII format.

RFC2152 UTF-7: A Mail-Safe Transformation Format of Unicode specifies methods for encoding Unicode in mail messages, but not for mail addresses, domain names or URLs.

RFC2181 Clarifications to the DNS Specification clarifies that 'any binary string whatever can be used as the label'.

RFC2070 Internationalisation of the Hypertext Markup Language is one of many RFCs that describe multilingual documents but do not address the issue of DNS, email or URLs.

RFC1468 for Japanese, RFC1557 for Korean and RFC1922 for Chinese specify encodings for these character sets that use escape sequences to switch character sets.
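To make the differences between these encodings concrete, the snippet below runs sample strings through Python's standard library: urllib.parse.quote for RFC1738-style %HH escaping, and the built-in "utf-7" and "iso2022_jp" codecs for RFC2152 and RFC1468. It is only an illustration and the sample strings are arbitrary. Note that '%', '+' and the escape character all fall outside the sub-ASCII set, whereas a plain hex rendering of the UCS-2 octets uses only the digits 0-9 and the letters A-F.

```python
from urllib.parse import quote

name = "中文"  # sample multilingual label (U+4E2D U+6587)

# RFC 1738: octets outside the unreserved set are written as %HH.
# quote() accepts a str (encoded as UTF-8 by default) or raw bytes.
url_utf8 = quote(name)                      # '%E4%B8%AD%E6%96%87'
url_ucs2 = quote(name.encode("utf-16-be"))  # 'N-e%87': 4E, 2D, 65 are 'N', '-', 'e'

# RFC 2152: UTF-7 wraps modified base64 of the UTF-16 text between '+' and '-'.
utf7 = name.encode("utf-7")

# RFC 1468: ISO-2022-JP switches into JIS X 0208 with the escape sequence ESC $ B.
iso2022jp = "日本語".encode("iso2022_jp")

# For comparison: every UCS-2 octet as two hex digits uses only 0-9 and A-F.
hex_in_ascii = name.encode("utf-16-be").hex().upper()  # '4E2D6587'

for label, value in [("RFC 1738 (UTF-8)", url_utf8), ("RFC 1738 (UCS-2)", url_ucs2),
                     ("UTF-7", utf7), ("ISO-2022-JP", iso2022jp),
                     ("Hex in ASCII", hex_in_ascii)]:
    print(f"{label:18} {value!r}")
```

The %HH form escapes only some octets, so its output depends on which octets happen to be printable; writing every octet as two hex digits gives a fixed-length result that always fits the sub-ASCII set.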

It would be appreciated by a person skilled in the art that numerous variations and/or modifications may be made to the present invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects to be illustrative and not restrictive.
1. A method for providing for multilingual names for use on the Internet, related networks and computers, said method comprising the steps of:

forming an initial multilingual name in a multilingual format;

mapping said multilingual name to a corresponding coded name in a reversible manner, said coded name comprising a restricted subset of the ASCII character set;

utilising said corresponding coded name on the Internet and related networks in place of said multilingual name.

2. A method as claimed in claim 1 wherein said mapping step further comprises adding a predetermined pseudo-root name to said corresponding coded name.

3. A method as claimed in any preceding claim 
wherein said mapping includes converting said multilingual 
name to a corresponding hexadecimal coded name and 
representing said hexadecimal coded name in an ASCII form. 

4. A method as claimed in any preceding claim wherein said corresponding coded name is divided into a series of labels with each label having a predetermined portion comprising a control code for said label.

5. A method as claimed in any preceding claim wherein a multilingual name is parsed or broken down into components.

6. A method as claimed in any preceding claim wherein components of a multilingual name that have special, reserved, or schematic meaning are replaced with synonymous components in the coded name.

7. A method as claimed in any preceding claim 
wherein the characters of a multilingual name are replaced 
with their base equivalent characters. 

8. A method for providing, in domain name systems, answers that contain additional information about names that are similar to names in questions to the domain name system, the names being multilingual, coded or ordinary ASCII.

9. A method as claimed in any preceding claim 
wherein said method is used in applications that also 
directly or indirectly use internet protocols or services. 

10. A method as claimed in any preceding claim 
wherein said method is used in internet applications, 
utilities, resources or services. 

11. A method as claimed in any preceding claim 
wherein a multilingual name is represented by a coded name 
for the purposes of sending, receiving or otherwise 
processing one of email, talk, chat, IRC, the coded name 
being a name for a user of a program, computer system, or 
network. 